Abstract: We introduce Entropy Rank and Free Energy Rank,
two methods to classify pages of the Web according to interest.
These variants of PageRank are based on Ruelle's thermodynamic
formalism. They exhibit features from both PageRank
and HITS methods.

I. Introduction 

The World Wide Web can be modelled as a graph: pages 
are vertices, edges are hyperlinks. Google and other Web 
search engines attribute to each page of the Web a 'PageRank' 
score, which measures how well-connected this page is with 
respect to other pages [2]. More specifically, a page has a 
high PageRank if pointed to by pages with a high PageRank. 
Kleinberg [6] has proposed the HITS method, where a page 
is a good 'hub' on a topic if it points to good 'authorities' on 
this topic, and a page is a good authority if pointed at by good 
hubs. Other variants have been proposed by several authors;
we mention only Ding et al. [4], who propose a framework
generalizing HITS and PageRank, and Akian et al. [1], who
use thermodynamic concepts in a different way from us.

In this paper we apply methods of dynamical systems theory 
and statistical physics to the field of large graphs, and in 
particular we introduce Entropy Rank and Free Energy Rank 
methods, which rank pages of a Web graph. Their basic idea 
is to rank paths rather than just vertices, with a probability law 
given by thermodynamic principles. The last section discusses
the limitations and possible extensions of Entropy
Rank and Free Energy Rank.

In the next two subsections we briefly recall the PageRank
method; the Entropy Rank and Free Energy Rank methods are
then outlined. Sections II and III provide formal definitions and
results.

A. PageRank: First approach 

First, let us normalize every row of the adjacency matrix of
the Web graph in such a way that it sums to 1. This is possible
if every page contains at least one hyperlink. Then the resulting
matrix is row-stochastic and the Web graph is interpreted as a
Markov chain. We consider a surfer that moves from page
to page following the hyperlinks, choosing a hyperlink on the
current page uniformly at random. The PageRank [2] can be
defined as the stationary distribution on the vertices of the
graph: a page has a high PageRank if it is visited often by the
random surfer. This distribution is computed as the dominant
left eigenvector of the row-normalized adjacency matrix.
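
As a minimal sketch of this first approach, the stationary distribution
can be approximated by power iteration on the row-normalized adjacency
matrix. The three-page graph and the name `stationary_distribution` are
hypothetical illustrations, not taken from the paper; the sketch assumes
that every page has at least one hyperlink and that the chain is
irreducible and aperiodic.

```python
import numpy as np

def stationary_distribution(P, iters=1000, tol=1e-12):
    """Power iteration for the dominant left eigenvector of a row-stochastic P."""
    n = P.shape[0]
    p = np.full(n, 1.0 / n)           # start from the uniform distribution
    for _ in range(iters):
        q = p @ P                     # one step of the random surfer
        if np.abs(q - p).sum() < tol:
            return q
        p = q
    return p

# Hypothetical 3-page web: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0
adjacency = np.array([[0, 1, 1],
                      [0, 0, 1],
                      [1, 0, 0]], dtype=float)
P = adjacency / adjacency.sum(axis=1, keepdims=True)  # rows sum to 1
pagerank = stationary_distribution(P)
```

Pages 0 and 2 are visited equally often in this toy graph, and page 1
half as often.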

Some problems arising with this definition are that 

• it assumes that all pages contain hyperlinks, which is not 
true in practice; 

• it gives a zero PageRank to pages to which no page 
points, although these pages might be interesting as 
'hubs' (i.e., they point to interesting pages); 

• the stationary distribution is not unique, e.g., if the graph 
is not connected. 

B. PageRank: Improved approach 

To overcome these problems, the possibility is given to the
random surfer, with some probability $0 < 1 - \alpha < 1$, to jump
to any other page of the web (with uniform distribution). The
surfer follows a hyperlink of the current page with probability
$\alpha$.

Let $A$ be the adjacency matrix of the graph, with every
non-zero row normalized to 1. Then the stochastic matrix $M$
describing the Markov chain is constructed as follows. Let
$e$ be the vector of all ones, normalized in order to sum to
one. The $i$th row is equal to $(1 - \alpha) e^T + \alpha A_i$ if $A_i$ (the
$i$th row of $A$) is non-zero. If $A_i = 0$ then the $i$th row is
taken as $e^T$. The left dominant eigenvector of this matrix $M$,
normalized in order to sum to one, gives the unique stationary
distribution on the vertices. The PageRank is now defined as
this stationary distribution. Note that in practice, the entries of
$e$ are not necessarily all equal but can be chosen in order to
favor some pages.

If $\alpha$ tends towards 1, then we recover the first approach
above. If $\alpha$ tends towards 0, then the stationary distribution
tends towards the uniform distribution.
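
The construction of the matrix M can be sketched as follows. The names
`pagerank_matrix` and `pagerank`, the four-page graph and the value
alpha = 0.85 are assumptions made for illustration only; the paper does
not prescribe them.

```python
import numpy as np

def pagerank_matrix(A, alpha, e=None):
    """Row i is (1 - alpha) * e + alpha * A_i, or e if page i has no hyperlink."""
    n = A.shape[0]
    e = np.full(n, 1.0 / n) if e is None else np.asarray(e, dtype=float)
    M = np.tile(e, (n, 1))                    # default row for pages without links
    row_sums = A.sum(axis=1)
    has_links = row_sums > 0
    normalized = A[has_links] / row_sums[has_links, None]
    M[has_links] = (1 - alpha) * e + alpha * normalized
    return M

def pagerank(A, alpha=0.85, iters=10_000, tol=1e-13):
    """Stationary distribution of M, approximated by power iteration."""
    M = pagerank_matrix(A, alpha)
    p = np.full(A.shape[0], 1.0 / A.shape[0])
    for _ in range(iters):
        q = p @ M
        if np.abs(q - p).sum() < tol:
            break
        p = q
    return q / q.sum()

# Hypothetical graph: 0 -> 1, 1 -> 2, 2 -> 1; page 3 has no hyperlinks
# and page 0 has no incoming hyperlink.
A = np.array([[0, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 1, 0, 0],
              [0, 0, 0, 0]], dtype=float)
M = pagerank_matrix(A, 0.85)
scores = pagerank(A, 0.85)
nearly_uniform = pagerank(A, 1e-6)            # alpha close to 0
```

With alpha strictly below 1 every page receives a positive score, even
page 0, to which no page points.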

However, we might argue that this definition still has some
questionable aspects. Indeed, take the graph of Figure 1.
Vertices 1, 2, 3 and 4 form a complete directed subgraph,
hence they concentrate most of the probability for values
of $\alpha$ close to one, as expected. But the PageRank attributes
an equal probability to vertices 6 and 8, as one can easily check.
This is arguably undesirable, because 8 is
obviously a better page than 6: it points directly to the most
interesting pages.
Fig. 1. Ranking scores according to different methods are computed on
this graph. Vertices 6 and 8 have the same PageRank, whatever value of $\alpha$ is
chosen, while both Entropy Rank and Free Energy Rank are able to distinguish
them. The gap of Entropy Rank between the best vertices and worst vertices
is larger than for any other method.



C. Introduction to Entropy Rank 

We now briefly describe a variant of PageRank that we
call the Entropy Rank method, based on ideas from ergodic
theory. We follow the references [3], [7], [8]. Section II
explains the algorithm in full depth.

Given the Web graph, we choose probabilities on the edges 
in such a way that the resulting Markov chain has a maximum 
entropy rate (defined below). In other words, the behavior of 
a surfer choosing every edge with these probabilities at every 
step is maximally unpredictable: even knowing the vertex 
currently visited, one cannot predict much about the vertex 
visited next (at least on average). Equivalently, all paths of 
same (large) length are approximately equiprobable, so one 
cannot make a good guess about the path that will actually be 
followed. If we consider an army of random surfers instead of 
just one, the probabilities are chosen in such a way that the 
army is maximally dispersing in the graph, exploring every 
path with an (approximately) equal probability. 

More generally, we consider transition probabilities that
depend on the past pages visited by the surfer (losing the
Markov property); however, allowing such a memory of the
past is not needed to reach the optimum, for it turns out that
there always exists an optimal set of transition probabilities
that is Markovian.

We recall the definition of the entropy rate (or Kolmogorov-Sinai
entropy): we take the Shannon entropy of all paths of
length $t$, divided by $t$, and take the limsup as $t \to \infty$. The
Shannon entropy of a random variable $X$ taking finitely many
values is $-\sum_x \mathrm{Prob}(X = x) \log \mathrm{Prob}(X = x)$. The logarithm
is taken in base $e$.
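
To make the definition concrete, here is a small sketch that enumerates
all vertex sequences with t transitions of a stationary Markov chain and
computes their Shannon entropy. The two-state chain and the function
names are hypothetical. For a stationary Markov chain the entropy of
such sequences is the entropy of the initial vertex plus t times the
average entropy of one transition, so dividing by t approaches the
entropy rate as t grows.

```python
import itertools
import numpy as np

def shannon_entropy(probs):
    """-sum p log p in base e, ignoring zero-probability outcomes."""
    probs = np.asarray(probs, dtype=float)
    probs = probs[probs > 0]
    return float(-(probs * np.log(probs)).sum())

def path_entropy(p, P, t):
    """Shannon entropy of all vertex sequences with t transitions."""
    n = len(p)
    probs = []
    for path in itertools.product(range(n), repeat=t + 1):
        prob = p[path[0]]
        for a, b in zip(path, path[1:]):
            prob *= P[a, b]
        probs.append(prob)
    return shannon_entropy(probs)

# Hypothetical chain: from state 0 go to 0 or 1 with probability 1/2,
# from state 1 always return to 0.
P = np.array([[0.5, 0.5],
              [1.0, 0.0]])
p = np.array([2 / 3, 1 / 3])                  # stationary distribution of P
one_step = (2 / 3) * np.log(2)                # average entropy of one transition
t = 8
estimate = path_entropy(p, P, t) / t          # tends to one_step as t grows
```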

The stationary distribution induced by the Markov chain is
unique if the graph is strongly connected, and is called the
Entropy Rank. It is easily computed from the left and right
eigenvectors of the adjacency matrix, as proved by Parry [8]
and detailed in Section II.

On the example of Figure 1, we observe that vertex
6 has an Entropy Rank around thirty times lower than
vertex 8, which indicates that page 8 is more interesting.
It illustrates the fact that with the Entropy Rank, a page is
good if good pages point to it or it points to good pages.

Also, in the same example, we see that the gap of prob- 
abilities between good vertices and poor vertices is higher 
than given by the PageRank method (at least two orders of 
magnitude instead of one). 

D. Introduction to Free Energy Rank 

The Entropy Rank has the same drawbacks as the first
approach to PageRank: it is a meaningful quantity if
the graph is strongly connected but otherwise can attribute
a probability zero to some vertices, for instance those with
zero indegree or zero outdegree. The following modification,
which is a particular case of Ruelle's so-called thermodynamic
formalism [9], overcomes this problem in a way similar to the
improved version of PageRank.

We allow any transition from any vertex to any vertex, but
edges that are not in the graph are weighted with an energy
$U = -\epsilon < 0$, while the edges of the Web graph have energy
$U = 0$. We now look for transition probabilities such that
the resulting Markov chain maximizes the quantity $S + U$,
where $S$ is the entropy rate and $U$ is the expected energy for the
stationary distribution of the Markov chain. The maximum
value of $S + U$ is analogous to what is called 'free energy' in
thermodynamics (with unit temperature and up to the sign).
It is also called 'topological pressure' in the literature of
thermodynamic formalism. Intuitively, the surfer has to find a
balance between being as unpredictable as possible and taking
the edges with zero energy as often as possible.

The stationary distribution of this Markov chain is always
unique and is positive on every vertex. We define the Free
Energy Rank to be this distribution. If $\epsilon \to \infty$ then we
recover the Entropy Rank. If $\epsilon \to 0$ then the stationary
distribution is uniform over the edges.

II. Entropy Rank 

We now provide the formal definitions in their full gen- 
erality. Let A be the adjacency matrix of a simple directed 
graph. An infinite path is an infinite sequence of edges such 
that the terminal vertex of every edge is the initial vertex of 
the next edge. An infinite path has an initial vertex but no 
terminal vertex. A finite path is a finite such sequence, and 
has an initial vertex and a terminal vertex. 

Consider a user surfing randomly on the Web, clicking at 
every step on a hyperlink. Instead of choosing any hyperlink 
on the page with equal probability (like for the PageRank), 
she chooses each hyperlink with a probability that possibly
depends on the whole history (list of visited pages up to 
now). We want these probabilities to be such that the surfing 
is maximally 'unpredictable'. 

Formally, we want a probability distribution over finite paths
of length $t$, for any $t > 0$. This is interpreted as the probability
that the finite path is actually followed by the random surfer
from any given time. We are looking for an invariant, or
stationary, distribution, which means that the probability for
the random surfer to be at time $s$ in a certain vertex and to
follow a certain finite path from this vertex is independent
of the time $s$ of observation. Of course this distribution on
paths has to satisfy the following constraint: the probability of
$e_0 e_1 \ldots e_t$ is the sum over $e_{t+1}$ of all probabilities of paths
$e_0 e_1 \ldots e_t e_{t+1}$. We want furthermore the entropy rate of this
distribution (defined in Section I-C) to be maximal.

In more concise terms, we are looking for a shift-invariant
probability measure on the set of infinite paths with maximal
entropy rate. The set of infinite paths is endowed with the
usual $\sigma$-algebra of measurable sets, generated by all sets of
infinite paths extending a given finite path [3], [7]. The shift
map is the map that sends the infinite path $e_0 e_1 e_2 e_3 \ldots$ to
$e_1 e_2 e_3 \ldots$. A measure is shift-invariant if the inverse image
of any measurable set by the shift is a set of the same measure.

The Shannon entropy of a probability distribution over a set
of $N$ elements is at most $\log N$, and the uniform distribution
is the only distribution to achieve this bound, as is well known.
Now consider a probability distribution that is uniform up to
a factor of $a$, meaning that the probability of any event is at
most $a/N$. Then the Shannon entropy of this distribution is
a convex combination of terms $-\log \mathrm{Prob}(X = i)$, each of
which is at least $\log N - \log a$. Hence the Shannon entropy
itself is at least $\log N - \log a$.

For any probability distribution over the paths, the
Shannon entropy of paths of length $t$ is at most
$\log \#\{\text{paths of length } t\}$. Hence the entropy rate is at most
$\limsup_{t \to \infty} \frac{\log \#\{\text{paths of length } t\}}{t}$. This last quantity is called
the topological entropy of the graph, because it does not depend
on any particular probability distribution, but is intrinsic
to the graph. Since the number of paths of length $t$ is the
sum of all entries of $A^t$, the topological entropy is readily
seen to be equal to the logarithm of the spectral radius of the
adjacency matrix $A$.
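
The link between path counting and the spectral radius can be checked
numerically. The three-vertex graph below is a hypothetical example
(its spectral radius happens to be the golden ratio), and `count_paths`
is an illustrative name.

```python
import numpy as np

def count_paths(A, t):
    """Number of paths with t edges: the sum of all entries of A^t."""
    power = np.eye(A.shape[0], dtype=np.int64)
    for _ in range(t):
        power = power @ A
    return int(power.sum())

# Hypothetical graph with edges 0->1, 0->2, 1->0, 2->0, 2->1
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 1, 0]], dtype=np.int64)

spectral_radius = float(max(abs(np.linalg.eigvals(A))))
topological_entropy = float(np.log(spectral_radius))

t = 40
estimate = np.log(count_paths(A, t)) / t              # log #{paths} / t
growth = count_paths(A, t + 1) / count_paths(A, t)    # ratio of successive counts
```

The quantity `estimate` approaches the topological entropy only at rate
1/t, because of the constant factor in front of the exponential growth,
whereas the ratio of successive counts converges geometrically to the
spectral radius on this graph.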

Now, following Parry [8], we exhibit a particular probability
distribution whose entropy rate attains this bound. This distribution
happens to have the Markov property: the random surfer moves
to the next vertex with a probability depending only on the
current vertex, not on the previous history. Let $\lambda$ be a
positive eigenvalue of $A$ of maximal magnitude, $u$ be a
nonnegative left eigenvector for $\lambda$ and $v$ be a nonnegative
right eigenvector for $\lambda$. We thus have $u^T A = \lambda u^T$ and
$A v = \lambda v$. Their existence is ensured by the Perron-Frobenius
theorem, and they can be computed by the power method.
Normalize $u$ such that $\sum_i u_i = 1$, and normalize $v$ such
that $\sum_i u_i v_i = 1$. Choose the probability to take the existing
edge $(i,j)$, knowing that we start from $i$, to be $v_j / (\lambda v_i)$.

This is indeed a probability distribution over the outgoing
edges of $i$, since $\sum_{j : (i,j) \text{ is an edge}} v_j / (\lambda v_i) = \lambda^{-1} (Av)_i / v_i = 1$.
This turns the graph into a Markov chain. Call $M$ the
stochastic matrix encoding the transition probabilities. Then
$M = \lambda^{-1} \mathrm{diag}(v)^{-1} A \, \mathrm{diag}(v)$, where $\mathrm{diag}(v)$ is the diagonal
matrix formed from the vector $v$.

The distribution attributing a probability $p_i = u_i v_i$ to vertex
$i$ is an invariant distribution on the vertices of the Markov
chain. Indeed, $p^T M = p^T \lambda^{-1} \mathrm{diag}(v)^{-1} A \, \mathrm{diag}(v) =
u^T \lambda^{-1} A \, \mathrm{diag}(v) = u^T \mathrm{diag}(v) = p^T$.
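
A minimal numerical sketch of this construction follows. It obtains the
eigenvectors with `numpy.linalg.eig` rather than the power method, purely
for brevity; the function name `perron_vectors` and the three-vertex
graph are hypothetical, and the sketch assumes a strongly connected graph
so that the dominant eigenvectors are positive.

```python
import numpy as np

def perron_vectors(A):
    """Dominant eigenvalue lam, left vector u (sum 1), right vector v (u.v = 1)."""
    eigvals, vecs = np.linalg.eig(A)
    k = int(np.argmax(eigvals.real))
    lam = float(eigvals[k].real)
    v = np.abs(vecs[:, k].real)               # fix the arbitrary sign
    eigvals_t, vecs_t = np.linalg.eig(A.T)    # left eigenvectors of A
    u = np.abs(vecs_t[:, int(np.argmax(eigvals_t.real))].real)
    u = u / u.sum()
    v = v / (u @ v)
    return lam, u, v

# Hypothetical strongly connected graph: 0->1, 0->2, 1->0, 2->0, 2->1
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 1, 0]], dtype=float)

lam, u, v = perron_vectors(A)
M = A * v[None, :] / (lam * v[:, None])       # M_ij = A_ij v_j / (lam v_i)
p = u * v                                     # invariant distribution p_i = u_i v_i
```

On this graph the distribution p gives vertex 0 the probability
1/sqrt(5) and the two other vertices equal shares of the remainder.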

This Markov chain endowed with an invariant distribution
on vertices yields an invariant distribution on paths. The
probability of path $(i,j)$ is $u_i v_i \lambda^{-1} v_j / v_i = \lambda^{-1} u_i v_j$, the
probability of path $(i,j)(j,k)$ is $\lambda^{-1} u_i v_j \lambda^{-1} v_k / v_j = \lambda^{-2} u_i v_k$, and
more generally any path of length $t$ going from vertex $i$ to
vertex $j$ has a probability $\lambda^{-t} u_i v_j$ (which does not depend on
the intermediate vertices). We know that the number of paths
of length $t$ is of the order of $\lambda^t$ (up to a factor). Hence the
probability distribution over paths of fixed length is uniform up
to a factor. The Shannon entropy of paths of length $t$ therefore
grows as $t \log \lambda$, up to an additive constant. The entropy rate
of this distribution is thus $\log \lambda$, which is optimal.
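
Both statements can be checked on a small example: two different paths
with the same endpoints and length get the same probability, and the
entropy rate of the chain equals the logarithm of the dominant
eigenvalue. The sketch below uses a hypothetical graph and the standard
formula for the entropy rate of a stationary Markov chain, namely minus
the sum over i and j of p_i M_ij log M_ij.

```python
import numpy as np

# Hypothetical strongly connected graph: 0->1, 0->2, 1->0, 2->0, 2->1
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 1, 0]], dtype=float)

eigvals, vecs = np.linalg.eig(A)
k = int(np.argmax(eigvals.real))
lam = float(eigvals[k].real)
v = np.abs(vecs[:, k].real)
eigvals_t, vecs_t = np.linalg.eig(A.T)
u = np.abs(vecs_t[:, int(np.argmax(eigvals_t.real))].real)
u = u / u.sum()
v = v / (u @ v)

M = A * v[None, :] / (lam * v[:, None])       # maximal-entropy transition matrix
p = u * v                                     # invariant vertex distribution

def path_probability(path):
    """Probability of following the given vertex sequence in the stationary chain."""
    prob = p[path[0]]
    for i, j in zip(path, path[1:]):
        prob *= M[i, j]
    return prob

# Two distinct paths with 3 edges from vertex 0 to vertex 1
prob_a = path_probability([0, 1, 0, 1])
prob_b = path_probability([0, 2, 0, 1])
predicted = lam ** -3 * u[0] * v[1]

mask = M > 0
entropy_rate = float(-np.sum((p[:, None] * M)[mask] * np.log(M[mask])))
```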

In brief, we have proved the following facts: 

• the behavior of a random surfer with maximal entropy 
rate can be computed from a left and right nonnega- 
tive dominant eigenvector, for instance with the power 
method; 

• the resulting distribution on vertices is given by the 
componentwise product of the two eigenvectors; 

• this optimal random surfer need not have a memory of 
the past (Markov property). 

Definition 1: The Entropy Rank of vertex $i$ of an unweighted
graph is defined as the probability $u_i v_i$, where $u$
($v$) is the left (right) dominant eigenvector of the adjacency
matrix.

If the graph is strongly connected, then $\lambda$, $u$ and $v$ are
unique and positive, again by the Perron-Frobenius theorem. Then
the matrix $\lambda^{-t} A^t$ can be shown to converge to $v u^T$, whose
diagonal gives the vertex probability distribution. Actually, as
shown in [8], when the graph is strongly connected there is no
other probability distribution that maximizes the entropy rate.
The Entropy Rank is then uniquely defined and non-zero on
every vertex. See an example in Figure 1.

Let us mention some other examples. For the complete
graph on $n$ vertices, $A$ is the matrix of ones (except
on the diagonal), and we see that the entropy rate has the maximal
value $\log(n - 1)$ for the uniform distribution. If the graph is
the union of two complete graphs of different sizes, then the
optimal probability distribution is concentrated on the larger
component, thus the Entropy Rank of a whole component is
zero. If the graph is the union of two copies of the same graph,
then the probability distribution can be shared between the two
copies in an arbitrary way.

Note also that if we reverse all edges of the graph, then the
matrix $A$ is replaced by $A^T$, the vectors $u$ and $v$ switch their
roles and the final value for the Entropy Rank is the same.
Hence the entropy method takes into account not only the
paths leading to a vertex, but also the paths issued from a
vertex.
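
The examples and the reversal property above can be reproduced with a
short sketch. The function `entropy_rank` is an illustrative
implementation of Definition 1 based on `numpy.linalg.eig`; it assumes
that the dominant eigenvalue is simple and that the product of the two
eigenvectors is not identically zero (it would be for an acyclic graph,
for instance).

```python
import numpy as np

def entropy_rank(A):
    """Componentwise product of dominant left and right eigenvectors, summing to 1."""
    A = np.asarray(A, dtype=float)
    eigvals, vecs = np.linalg.eig(A)
    v = np.abs(vecs[:, int(np.argmax(eigvals.real))].real)
    eigvals_t, vecs_t = np.linalg.eig(A.T)
    u = np.abs(vecs_t[:, int(np.argmax(eigvals_t.real))].real)
    rank = u * v
    return rank / rank.sum()

def complete_graph(n):
    """Adjacency matrix of the complete directed graph without self-loops."""
    return np.ones((n, n)) - np.eye(n)

# Complete graph on 4 vertices: uniform Entropy Rank
uniform_rank = entropy_rank(complete_graph(4))

# Disjoint union of complete graphs on 3 and 2 vertices
two_components = np.block([[complete_graph(3), np.zeros((3, 2))],
                           [np.zeros((2, 3)), complete_graph(2)]])
concentrated_rank = entropy_rank(two_components)

# Hypothetical graph 0->1, 0->2, 1->0, 2->0, 2->1 and its reversal
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 1, 0]], dtype=float)
rank_forward = entropy_rank(A)
rank_reversed = entropy_rank(A.T)
```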
III. Free Energy Rank

We want a method giving to every graph a unique ranking
score, which is non-zero on every vertex. That is why we
add the following improvement, which is a particular case
of Ruelle's thermodynamic formalism [9]. On the complete
directed graph with self-loops that extends the original graph
we attribute an 'energy' $U = 0$ to the edges of the original
graph and an 'energy' $U = -\epsilon < 0$ to the other edges. Now
consider the set of all paths in the complete graph. The energy
of a path is the energy of its first edge. On this set we want
to put an invariant probability measure that maximizes the
quantity $S + U$, where $S$ is the entropy rate and $U$ is the
expected energy for the probability measure. The maximum
of this quantity is analogous to what is called 'free energy' in
thermodynamics (with unit temperature and up to the sign).
It is also called 'topological pressure' in the literature of
thermodynamic formalism.

This time we consider the matrix $B$ such that $B_{ij} =
\exp(U_{ij})$, where $U_{ij}$ is the energy of the edge $(i,j)$. Note that
if $\epsilon \to \infty$, then $B$ converges to the adjacency matrix $A$. Note
also that the matrix $B$ can be obtained from $A$ by replacing
zero entries with $e^{-\epsilon}$.

It is possible to see that the maximizing probability
distribution exists and is unique, and we can compute it in the
following way. Let $\lambda, u, v$ be such that

• $\lambda$ is the dominant eigenvalue of $B$;

• $u^T B = \lambda u^T$ (left eigenvector);

• $B v = \lambda v$ (right eigenvector);

• $u > 0$, $\sum_i u_i = 1$;

• $v > 0$, $\sum_i u_i v_i = 1$.

These objects exist and are unique, by the Perron-Frobenius
theorem.

Now, we claim that the maximizing measure gives a
probability of $u_i v_i$ to be in vertex $i$. It also gives a probability of
$\lambda^{-1} \exp(U_{ij}) v_j / v_i$ for the transition $i \to j$, of energy $U_{ij}$. And this
probability measure is again Markovian. These claims can be
derived as corollaries to Ruelle's more general results [9], but
we prefer to give an elementary argument.

Definition 2: The Free Energy Rank is the probability $u_i v_i$.

Definition 3: For a given $\epsilon > 0$, the Free Energy Rank
of vertex $i$ of an unweighted directed graph is defined as
the probability $u_i v_i$, where $u$ ($v$) is the left (right) dominant
eigenvector of the matrix $B$ obtained from the adjacency
matrix by replacing the zero entries with $e^{-\epsilon}$.
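
Definition 3 translates almost directly into code. The sketch below is
one possible implementation, with the hypothetical name
`free_energy_rank`, applied to a hypothetical graph that is not strongly
connected; the eigenvectors are again obtained with `numpy.linalg.eig`
for brevity.

```python
import numpy as np

def free_energy_rank(A, epsilon):
    """u_i * v_i for the matrix B obtained from A by replacing zeros with exp(-epsilon)."""
    A = np.asarray(A, dtype=float)
    B = np.where(A > 0, 1.0, np.exp(-epsilon))    # diagonal zeros are replaced too
    eigvals, vecs = np.linalg.eig(B)
    v = np.abs(vecs[:, int(np.argmax(eigvals.real))].real)
    eigvals_t, vecs_t = np.linalg.eig(B.T)
    u = np.abs(vecs_t[:, int(np.argmax(eigvals_t.real))].real)
    u = u / u.sum()
    v = v / (u @ v)
    return u * v

# Hypothetical graph: 0 -> 1, 1 -> 2, 2 -> 1; vertex 3 is isolated and
# vertex 0 has zero indegree.
A = np.array([[0, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 1, 0, 0],
              [0, 0, 0, 0]])
rank = free_energy_rank(A, epsilon=2.0)
```

Every vertex gets a positive score here, and vertex 0, which points into
the cycle formed by vertices 1 and 2, scores higher than the isolated
vertex 3.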

The proof of the claim, which we give for the sake of
clarity, relies on the following result, well known in statistical
physics; see for instance [9]. Given a finite set endowed with
a real-valued energy function, the only distribution that
maximizes the free energy (sum of Shannon entropy and expected
energy) is the Boltzmann distribution, attributing probability
$\exp(U_i) / \sum_j \exp(U_j)$ to element $i$. The free energy is then
$\log \sum_i \exp(U_i)$. If a distribution is the Boltzmann distribution
up to a factor $a$, meaning that the probability for element $i$ is
at most $a \exp(U_i) / \sum_j \exp(U_j)$, then the corresponding free
energy is at least $\log \sum_i \exp(U_i) - \log a$.
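
This variational property is easy to test numerically. In the sketch
below the energies and the random trial distributions are arbitrary
illustrative choices; the free energy of the Boltzmann distribution is
compared with the logarithm of the normalizing sum and with the free
energy of randomly drawn distributions.

```python
import numpy as np

def free_energy(q, U):
    """Shannon entropy of q plus the expected energy under q."""
    q = np.asarray(q, dtype=float)
    support = q > 0
    entropy = -np.sum(q[support] * np.log(q[support]))
    return float(entropy + np.sum(q * U))

def boltzmann(U):
    """Distribution with probability exp(U_i) / sum_j exp(U_j)."""
    w = np.exp(U - np.max(U))          # shift for numerical stability
    return w / w.sum()

U = np.array([0.0, -1.0, -2.5, 0.3])   # arbitrary energies
q_star = boltzmann(U)
log_partition = float(np.log(np.sum(np.exp(U))))

rng = np.random.default_rng(0)
samples = rng.dirichlet(np.ones(len(U)), size=1000)
best_random = max(free_energy(q, U) for q in samples)
```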

The Markov chain described just above gives a probability
$\lambda^{-t} \exp(\sum U_{kl}) u_i v_j$ to a path of length $t$ from vertex $i$ to
vertex $j$, where $\sum U_{kl}$ is the sum of energies of all edges $(k, l)$
on the path. This has the form of a Boltzmann distribution, up
to a factor. Now if we give to a path of length $t$ a 'path energy'
that is the sum of all energies of its $t$ individual edges, then
this probability distribution yields a 'path free energy' equal
to $\log \sum_{\text{paths of length } t} \exp(\sum U_{kl})$ up to an additive constant
(independent of $t$), which is almost maximal. This path free
energy, divided by $t$, gives for $t \to \infty$ a maximal $S + U$.
Note that the expected energy of a path of length $t$ is exactly
$tU$, since the distribution is invariant and the expectation is
linear. Note also that the maximal free energy is again $\log \lambda$, the
logarithm of the spectral radius of $B$.

The interpretation of this framework is the following: a
random surfer can jump from any page to any page, with an
energy cost of $\epsilon$ if no hyperlink is present between the pages.
The surfer, whose aim is to optimize the free energy $S + U$, is
therefore encouraged to follow hyperlinks (edges of the graph) in
priority. If the energy gap $\epsilon$ is 0, then the optimal probability
is uniform. If the energy gap is high, then the surfer is encouraged
to follow hyperlinks most of the time. Such a phenomenon is
similar to what is observed when varying the factor $\alpha$ between
0 and 1 in the PageRank method (as detailed in Section I-B).
The free energy method also gives a non-zero probability to
any vertex of the graph. An example of calculation is shown
in Figure 1.
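
The role of the energy gap can be illustrated by computing, for several
values of epsilon, the stationary probability that the surfer follows an
actual hyperlink. In the sketch below the graph, the function name
`free_energy_chain` and the chosen values of epsilon are hypothetical.
The probability of using edge (i, j) in the stationary chain is
u_i B_ij v_j / lambda, which is summed over the edges of the original
graph.

```python
import numpy as np

def free_energy_chain(A, epsilon):
    """Return (Free Energy Rank, stationary probability of following a hyperlink)."""
    B = np.where(A > 0, 1.0, np.exp(-epsilon))
    eigvals, vecs = np.linalg.eig(B)
    k = int(np.argmax(eigvals.real))
    lam = float(eigvals[k].real)
    v = np.abs(vecs[:, k].real)
    eigvals_t, vecs_t = np.linalg.eig(B.T)
    u = np.abs(vecs_t[:, int(np.argmax(eigvals_t.real))].real)
    u = u / u.sum()
    v = v / (u @ v)
    link_probability = float(u @ A @ v / lam)   # B_ij = 1 on hyperlinks
    return u * v, link_probability

# Hypothetical strongly connected graph: 0->1, 0->2, 1->0, 2->0, 2->1
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 1, 0]], dtype=float)

rank_zero, link_zero = free_energy_chain(A, 0.0)      # no energy gap
_, link_small = free_energy_chain(A, 1.0)
_, link_medium = free_energy_chain(A, 3.0)
rank_large, link_large = free_energy_chain(A, 40.0)   # very large energy gap
```

With a zero gap all nine transitions of the complete graph are
equiprobable, so hyperlinks are followed with probability 5/9; with a
very large gap the rank is numerically indistinguishable from the
Entropy Rank of this graph.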

Again, the Free Energy Rank is invariant under reversal of 
edges. 

IV. Limitations and extensions 

There are two ways to get a high Entropy Rank or Free
Energy Rank: to be pointed to by good pages or to point to
good pages. This is reminiscent of the HITS method [6], which
computes a hub score and an authority score for every node
from the dominant eigenvectors of $A A^T$ and $A^T A$. The exact
relation between HITS and the entropy method remains to be
investigated.

Used as a substitute for PageRank on the full web graph, the
Entropy/Free Energy Rank method can be manipulated: to increase
the score of a page, it helps to point to many good pages.
Used as in the HITS method, i.e., on a subgraph of all pages that
contain a certain keyword, a page that points to many good
pages is indeed a good hub, hence deserves a good score. This
suggests that it might be more appropriate to compute
the Entropy/Free Energy Rank not on the full web graph but on
a subgraph of pages containing a specific keyword or directly
linked to such a page, as in the HITS method.

The energies on the edges can be chosen to be unequal,
in order to favor some pages, just as for the PageRank (see
Section I-B).

Ruelle's thermodynamic formalism allows one to put an energy
not only on the edges but on the paths. In other words, we can
define an energy function that depends on a whole path and not
only on the first edge. We do not know whether this possibility
could find applications in the field of large graphs.

To compute the vectors $u$ and $v$ for the Entropy/Free Energy
Rank, one can use the power method, whose speed of
convergence is determined by the spectral gap of $A$ or $B$.
The spectral gap of a matrix is the ratio of the magnitudes
of the first and second eigenvalues. In the PageRank method,
the speed of convergence is given by the spectral gap of the
stochastic matrix $M$ of Section I-B, whose rows are
$(1 - \alpha) e^T + \alpha A_i$ whenever $A_i$ is non-zero.
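
A minimal sketch of the power method and of the spectral gap, as defined
above, is given below. The names `power_method` and `spectral_gap`, the
tolerance and the test graph are illustrative assumptions; the sketch
presumes a nonnegative matrix whose dominant eigenvalue is strictly
larger in magnitude than all others.

```python
import numpy as np

def power_method(A, tol=1e-12, max_iters=10_000):
    """Dominant eigenvalue and right eigenvector (entries summing to 1) of A."""
    x = np.full(A.shape[0], 1.0 / A.shape[0])
    lam = 0.0
    for k in range(1, max_iters + 1):
        y = A @ x
        lam = y.sum()                  # eigenvalue estimate, since x sums to 1
        y = y / lam
        if np.abs(y - x).sum() < tol:
            return lam, y, k
        x = y
    return lam, x, max_iters

def spectral_gap(A):
    """Ratio of the magnitudes of the first and second eigenvalues."""
    mags = np.sort(np.abs(np.linalg.eigvals(A)))[::-1]
    return float(mags[0] / mags[1])

# Hypothetical strongly connected graph: 0->1, 0->2, 1->0, 2->0, 2->1
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 1, 0]], dtype=float)

lam, v, iters_right = power_method(A)          # right eigenvector
lam_left, u, iters_left = power_method(A.T)    # left eigenvector
rank = u * v / (u @ v)                         # Entropy Rank

B = np.where(A > 0, 1.0, np.exp(-1.0))         # Free Energy matrix, epsilon = 1
gap_A, gap_B = spectral_gap(A), spectral_gap(B)
```

The number of iterations needed for a given tolerance grows as the
spectral gap approaches 1.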

Instead of maximizing $S$ or $S + U$, we could also maximize
the quantity ($S + U$ + spectral gap of the resulting matrix $B$)
over $\epsilon$ and the transition probabilities. Indeed, we want to
converge as quickly as possible.

The entropy and free energy methods that we propose are 
not limited to the computation of rankings, but potentially 
apply any time that we need to transform a graph into a 
Markov chain. We only mention two examples. 

For instance, the following definition of distance between
the vertices of an undirected graph has been proposed [5].
We transform the graph into a Markov chain just as in the
PageRank method: from every node, choose an outgoing edge
with equal probability. Then the distance from vertex $x$ to
vertex $y$ is $d_{xy} + d_{yx}$, where $d_{xy}$ is the average first-passage
time from $x$ to $y$. Instead of taking this Markov chain, we can
choose the Markov chain that maximizes entropy, as in the
Entropy Rank method. This would lead to another definition
of distance.
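
The two distances can be compared on a small example. The sketch below
computes average first-passage times by solving a linear system, a
standard technique that is not detailed in the paper; the path graph on
four vertices and all function names are hypothetical.

```python
import numpy as np

def first_passage_times(P, target):
    """Average number of steps to first reach `target` from every vertex."""
    n = P.shape[0]
    others = [i for i in range(n) if i != target]
    Q = P[np.ix_(others, others)]             # chain restricted to other vertices
    times = np.zeros(n)
    times[others] = np.linalg.solve(np.eye(n - 1) - Q, np.ones(n - 1))
    return times

def commute_distance(P, x, y):
    """d_xy + d_yx for the Markov chain with transition matrix P."""
    return first_passage_times(P, y)[x] + first_passage_times(P, x)[y]

def uniform_chain(A):
    """Choose an outgoing edge with equal probability."""
    return A / A.sum(axis=1, keepdims=True)

def max_entropy_chain(A):
    """Entropy-maximizing chain for a connected undirected graph (symmetric A)."""
    eigvals, vecs = np.linalg.eigh(A)         # eigenvalues in ascending order
    lam, v = eigvals[-1], np.abs(vecs[:, -1])
    return A * v[None, :] / (lam * v[:, None])

# Hypothetical undirected path graph 0 - 1 - 2 - 3
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

d_uniform = commute_distance(uniform_chain(A), 0, 3)
d_max_entropy = commute_distance(max_entropy_chain(A), 0, 3)
```

On this graph the two chains give different distances between the end
vertices, which shows that the choice of Markov chain does matter.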

Markov chains are also useful in the field of graph clus- 
tering: see for instance van Dongen's Markov clustering al- 
gorithm [10]. A variant to van Dongen's algorithm using the 
entropy-maximizing Markov chain can be investigated. 